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Abstract — The application that motivates this paper is molec- 
ular imaging at the atomic level. When discretized at sub- 
atomic distances, the volume is inherently sparse. Noiseless 
measurements from an imaging technology can be modeled 
by convolution of the image with the system point spread 
function (psf). Such is the case with magnetic resonance force 
microscopy (MRFM), an emerging technology where imaging of 
an individual tobacco mosaic virus was recently demonstrated 
with nanometer resolution. We also consider additive white 
Gaussian noise (AWGN) in the measurements. Many prior works 
of sparse estimators have focused on the case when H has low 
coherence; however, the system matrix H in our application is 
the convolution matrix for the system psf. A typical convolution 
matrix has high coherence. The paper therefore does not assume 
a low coherence H. A discrete-continuous form of the Laplacian 
and atom at zero (LAZE) p.d.f. used by Johnstone and Silverman 
is formulated, and two sparse estimators derived by maximizing 
the joint p.d.f. of the observation and image conditioned on 
the hyperparameters. A thresholding rule that generalizes the 
hard and soft thresholding rule appears in the course of the 
derivation. This so-called hybrid thresholding rule, when used 
in the iterative thresholding framework, gives rise to the hybrid 
estimator, a generalization of the lasso. Unbiased estimates of the 
hyperparameters for the lasso and hybrid estimator are obtained 
via Stein's unbiased risk estimate (SURE). A numerical study 
with a Gaussian psf and two sparse images shows that the hybrid 
estimator outperforms the lasso. 



I. Introduction 

The structures of biological molecules like proteins and 
viri are of interest to the medical community [1]. Existing 
methods for imaging at the nanometer or even sub-nanometer 
scale include atomic force microscopy (AFM), electron mi- 
croscopy (EM), and X-ray crystallography [2], [3]. At the 
sub-atomic scale, a molecule is naturally a sparse image. That 
is, the volume imaged consists of mostly space with a few 
locations occupied by atoms. The application in particular 
that motivates this paper is MRFM [4], a technology that 
potentially offers advantages not existent in currently used 
methods. In particular, MRFM is non-destructive and capable 
of 3-d imaging. Recently, imaging of a biological sample with 
nanometer resolution was demonstrated [5]. Given that MRFM 
and indeed even AFM [6] measures the convolution of the 
image with a point spread function (psf), a deconvolution must 
be performed in order to obtain the molecular image. This 
paper considers the following problem: suppose one observes 
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a linear transformation of a sparse image corrupted by AWGN. 
With only knowledge of the linear transformation and noise 
variance, the goal is to reconstruct the unknown sparse image. 

The system matrix H is the linear transformation that, in 
the case of MRFM, represents convolution with the MRFM 
psf. Several prior works are only applicable when the system 
matrix has small pairwise correlation, i.e., low coherence or 
low collinearity [7]-[10]. Others assume that the columns of 
H come from a specific random distribution, e.g., the uniform 
spherical ensemble (USE), or the uniform random projection 
ensemble (URPE) [11]. These assumptions are inapplicable 
when H represents convolution with the MRFM psf. In 
general, a convolution matrix for a continuous psf would 
not have low coherence. Such is the case with MRFM. The 
coherence of the simulated MRFM psf used in the simulation 
study section is at least 0.557. 

The lasso, the estimator formed by maximizing the pe- 
nalized likelihood criterion with a l\ penalty on the image 
values [12], is known to promote sparsity in the estimate. 
The Bayesian interpretation of the lasso is the maximum a 
posteriori (MAP) estimate with an i.i.d. Laplacian p.d.f. on 
the image values [13]. Consider the following: given M 
i.i.d. samples of a Laplacian distribution, the expected number 
of samples equal to is zero. The Laplacian p.d.f. is more 
convincingly described as a heavy-tailed distribution rather 
than a sparse distribution. Indeed, when used in a suitable 
hierarchical model such as in sparse Bayesian learning [14], 
the Gaussian r.v., not commonly considered as a sparse dis- 
tribution, results in a sparse estimator. While using a sparse 
prior is clearly not a necessary condition for formulating a 
sparse estimator, one wonders if a better sparse estimator can 
be formed if a sparse prior is used instead. 

In [15], the mixture of a Dirac delta and a symmetric, 
unimodal density with heavy tails is considered; a sparse 
denoising estimator is then obtained via marginal maximum 
likelihood (MML). The LAZE distribution is a specific mem- 
ber of the mixture family. Going through the same thought 
experiment previously mentioned with the LAZE distribu- 
tion, one obtains an intuitive result: Mw samples equal 0, 
where w is the weight placed on the Dirac delta. Unlike 
the Laplacian p.d.f., the LAZE p.d.f. is both heavy-tailed 
and sparse. Under certain conditions, the estimator achieves 
the asymptotic minimax risk to within a constant factor [15, 
Thm. 1]. The lasso estimator can be implemented in an 
iterative thresholding framework using the soft thresholding 
rule [16], [17]. Use of a thresholding rule based on the LAZE 
prior in the iterative thresholding framework can potentially 
result in better performance. 

This paper develops several methods to enable Bayes- 
optimal nanoscale molecular imaging. In particular, advances 
are made in these three areas. 
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1) First, we introduce a mixed discrete-continuous LAZE 
prior for use in the MAP/maximum likelihood (ML) 
framework. Knowing only that the image is sparse, but 
lacking any precise information on the sparsity level, se- 
lection of the hyperparameters or regularization parame- 
ters has to be empirical or data-driven. The sparse image 
and hyperparameters are jointly estimated by maximiz- 
ing the joint p.d.f. of the observation and unknown 
sparse image conditioned on the hyperparameters. Two 
sparse Bernoulli-Laplacian MAP/ML estimators based 
on the discrete-continuous LAZE p.d.f. are introduced: 
MAPI and MAP2. 

2) The second contribution of the paper is the introduction 
of the hybrid estimator, which is formed by exclusively 
using the hybrid thresholding rule in the iterative thresh- 
olding framework. The hybrid thresholding rule is a 
generalization of the soft and hard thresholding rules. In 
order to apply this to the molecular imaging problem, it 
is necessary to estimate the hyperparameters in a data- 
driven fashion. 

3) Thirdly, SURE is applied to estimate the hyperparameter 
of lasso and of the hybrid estimator proposed above. The 
SURE-equipped versions of lasso and hybrid estimator 
are referred to as lasso-SURE and H-SURE. Our lasso- 
SURE result is a generalization of the results in [18], 
[19]. Alternative lasso hyperparameter selection methods 
exist, e.g., [20]. In [20], however, a prior is placed on 
the support of the image values that discourages the 
selection of high correlated columns of H. Since the 
H we consider has columns that are highly correlated, 
this predisposes a certain amount of separation between 
the support of the estimated image values , i.e., the 
sparse image estimate will be resolution limited. A 
number of other general-purpose techniques exist as 
well, e.g., cross validation (CV), generalized CV (GCV), 
MML [21]. Some are, however, more tractable than 
others. For example, a closed form expression of the 
marginal likelihood cannot be obtained for the Laplacian 
prior: approximations have to be made [13]. 

A simulation study is performed. In the first part, LS, oracu- 
lar LS, SBL, stagewise orthogonal matching pursuit (StOMP), 
and the four proposed sparse estimators, are compared. Two 
image types (one binary-valued and another based on the 
LAZE p.d.f.) are studied under two signal-to-noise ratio (SNR) 
conditions (low and high). MAP2 has the best performance in 
the two low SNR cases. In one of the high SNR cases, H- 
SURE has the best performance, while in the other, SBL is ar- 
guably the best performing method. When the hyperparameters 
are estimated via SURE, H-SURE is sparser than lasso-SURE 
and achieves lower l p error for p = 0, 1,2 as well as lower 
detection error Ed- In the second part of the numerical study, 
the performance of the proposed sparse estimators is studied 
across the range of SNRs between the low and high values 
considered in the first part. A 3-d reconstruction example is 
given in the third part, where the LS and lasso-SURE estimator 
are compared. This serves to demonstrate the applicability of 
lasso-SURE on a relatively large problem. 



The paper is organized into the following sections. First, 
the sparse image deconvolution problem is formulated in 
Section II. The algorithms are discussed in Section III: there 
are three parts to this section. The two MAP/ML estimators 
based on the discrete-continuous LAZE prior are derived 
in Section III-A. This is followed by the introduction of 
the hybrid estimator in Section III-B. Stein's unbiased risk 
estimate is applied in Section III-C to derive lasso-SURE and 
H-SURE. Section IV contains a numerical study comparing the 
proposed algorithms with several existing sparse reconstruc- 
tion methods. A summary of the work and future directions 
in Section V concludes the paper. 

II. Problem formulation 

Consider a 2-d or 3-d image, and denote its vector version 
by 8 G M. AI . In this paper, 8 is assumed to be sparse, viz., 
the percentage of non-zero 6% is small. Suppose that the 
measurement y G K w is given by 

y = H8 + w, where w - A/"(0,cr 2 I), (1) 

where H G R Ar><A/ is termed the system matrix, and w is 
AWGN. The problem considered can be stated as: given y, 
H, and a > 0, estimate 8 knowing that it is sparse. Without 
loss of generality, one can assume that the columns of H have 
unit I2 norm. In the problem formulation, note that knowledge 
of the sparseness of 8, viz., ||0||o, is not known a priori. 

It should be noted that, while the sparsity considered in (1) 
is in the natural basis of 8, a wavelet basis has been considered 
in other works, e.g. [19]. It may be possible to re-formulate 
(1) using some other basis so that the corresponding system 
matrix has low coherence. This question is beyond the scope 
of the paper. The emphasis here is on (1) and on sparsity in 
the natural basis. If H had full column rank, an equivalent 
problem formulation is available. Since (H'H) is invertible, 
(1) can be re-written as 

y = 8 + w, where w ~ A/"(0, ^hW)') (2) 

where y = EUy; EU = (H'H) _1 H' is the pseudoinverse of 
H; and w = ftfw is colored Gaussian noise. Deconvolution 
of 8_ from y in AWGN is therefore equivalent to denoising 
of 8 in colored Gaussian noise. In the special case that H is 
orthonormal, w is also AWGN. 

III. Algorithms 

A. Bernoulli-Laplacian MAP/ML sparse estimators 

This section considers the case when the discrete-continuous 
i.i.d. LAZE prior is used for p(8_\C), with 8_ and £ simultane- 
ously estimated via MAP/ML. For the continuous distribution, 
8, ( are obtained as the maximizers of the conditional density 

p{y,i\0> viz " 

6, C = argmax logp(y, 8\() (3) 

If £ were constant, 8 obtained from (3) would be the MAP 
estimate. If 8 were constant, the resulting £ would be the ML 
estimate. Since these two principles are at work, it cannot be 
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said that the estimates obtained via (3) are strictly MAP or 
ML. 

Recall that the LAZE p.d.f. is given by 



pie,) = (l-w)6(9 i )+wj{9 i ;a), 



(4) 



where j(x; a) = (l/2)ae _a ' ;r ' is the Laplacian p.d.f. The 
Dirac delta function is difficult to work with in the context 
of maximizing the conditional p.d.f. in (3). Consider then a 
mixed discrete-continuous version of (4). Define the random 
variables 6i and Li such that 6i = I$i, 1 < i < M. The r.v.s 
9i,Ii have the following density: 



h = 
P0i\li) = 



with probability (1 — w) 

1 with probability w 

7 (^;a) h = l ' 



(5) 
(6) 



where <?(•) is some p.d.f. that will be specified later on. It is 
assumed that (9i, Ii) are i.i.d. Ii assumes the role of the Dirac 
delta: its introduction necessitates use of the auxiliary density 
g in (6). Instead of (3), consider the optimality criterion 



£,I,C = argmax \ogp{9, I\y, Q 
LLC 



(7) 



Let Ji = {i : I t = 1} and l a = 1\ = {i : h = 0}. The 
maximization of (7) is equivalent to the maximization of 

wm-vf 



2<7 2 

imio logio ■ 



+ (M- ||I|| )log(l- W ) 



J2 lo § o ae ~ 



E 



(8) 

We propose to maximize (8) in a block coordinate-wise fash- 
ion [22] via Algorithm 1. Note that 9i = Qili- A superscript 
"(n)" attached to a variable indicates its value in the nth 
iteration. 

Algorithm 1 Block coordinate maximization of MAP criterion 
Require: 9 {a \ i (0) , e > 



n <- 
repeat 



1 



n «— n 

( (n) ^ argmax c vl/ raap (fC"" 1 ) 



S: |W , l (n) _ argmax^ * map (~9, 1, 



6: until \\e 



< e 



The p.d.f. g arises as an extra degree of freedom due to 
the introduction of the indicator variables Ii. Consider two 
cases: first, let g(x) = 7(2;; a) in (8). This will give rise to the 
algorithm MAPI. Second, let g{x) be an arbitrary p.d.f. such 
that: (1) \g{x)\ < 00 for all x € R; (2) sup g(x) is attained for 
some xel; and (3) g(x) is independent of a, w. By selecting 
g that satisfies these three properties, the algorithm MAP2 is 
thus obtained. 



1) MAPI: Let * ma pi(£L C) denote the function obtained 
by setting g(x) = j(x; a). Step (4) of Algorithm 1 is 
determined by the solution to S7^ map \ = 0. This is solved as 



M . IIIIIo 
— — and w = . 



(9) 



It can be verified that the Hessian V^V^^mapi is negative 
definite for all a > and < w < 1. Given n samples 
xi,...,x n drawn from a Laplacian p.d.f. r y(-;a), the ML 
estimate of a is «ml = n (J2"=i l^l) - ■ The estimate a in 
(9) is therefore the ML estimate of a where all of the 9iS are 
used. 

The maximization in step (5) of Algorithm 1 can be obtained 
by applying the EM algorithm [16]. Recall that EM can be 
applied using z = 9 + aw_ x as the complete data, where 

w t ~ A/"(0,a 2 I) and a < ct/||H|| 2 . Denote by § {n \z {n) the 
estimates in the ?ith EM iteration. The E-step is the Landweber 
iteration 



(n-l) 



Define the hybrid thresholding rule as 

T hy {x;t 1 ,t 2 ) = (x - sgn(x)t 2 )I(\x\ > ti 



(10) 



(ID 



where t\ and t 2 are restricted to < ti < t\. See Fig. 1. This 
is a generalization of the soft and hard thresholding rules. 
The soft thresholding rule T s (x; t) = T/, v (x; t, t), and the hard 
thresholding rule T h (x;t) = xl(\x\ > t) = T hy (x;t,0). The 



hybrid threshold 





Fig. 1. Hybrid thresholding rule. 

M-step of the EM algorithm is given by 

s(«)._ 2 -l Kl ( a ,w),aa, 2 ) 0<w< 



T hy (z\ ';aa^ 



<w<l 



(12) 



where Ki{a, w) = \/2a 2 log((l — w)/w). Recall that 9i = 
9ili. If w > 1/2, the soft-thresholding rule is applied in the 
Q-step of the EM iterations of MAPI. These iterations produce 
the lasso estimate with hyperparameter £ = 2aa 2 . However, if 
0< w < 1/2, a larger thresholding value is used that increases 
the smaller w becomes. 
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2) MAP2: From (6) and the assumptions on g, 9i ^ 
w.p. 1. Consequently, the set 

T\ = {i : h = 1} = {i :6 t ^0} w.p. 1. (13) 

This implies ||/||o = \\d_\\o W -P- 1- Apply (13) to the criterion 
to maximize, viz., (8), and denote the result by ^mapiiS., Li 0- 
One gets 



map 2 



l 

2o~ 2 



|H0-y|| 2 + (M-||/|| o )log(l- w ) 



||I||ologw + ||£|| log--a||£||i+ J2 lo S5(^)- 

{i:h=0\ 

(14) 



The maximization in step (i) is obtained by solving for 
V^m^ = 0, which produces 



m\o , . ii£ii 

— — and w = . 

leili 



(15) 



As before, one can verify that the Hessian V^Vj 1 !^ is 
negative definite for all a > and < w < 1. It is instructive 
to compare the hyperparameter estimates of MAPI vs. MAP2, 
i.e., (9) vs. (15). The main difference lies in the estimation of 
a. Assuming that the estimates 7 and 9 obey (13), one can re- 
write the MAP2 estimate a = £) i£l This is the ML 
estimate using only the 9i,i G I\, i.e., the non-zero voxels. 
On the other hand, the MAPI estimate of a can be written as 



Pil + IZbl 



(16) 



As with MAPI, the maximization in step (5) of Algorithm 1 
can be obtained by applying the EM algorithm with the 
complete data z = 9 + aw v The E-step is given by (10), 
which is the same as MAPI's E-step. Define 



g* = svpg(x) and r 



A -9* 1 ~ w 
a/2 w 



(17) 



The resulting 9 in the M-step is given by the following 
thresholding rule 



Thy(Zi '; aa 2 + n 2 (a, r), aa 2 ) r > 1 
T s (4 n) ;aa 2 ) 



< r < 1 



(18) 



where K2(a, r) = y/ 2a 2 log r, which is similar to the M- 
step of MAPI. Indeed, the M-step of MAPI can be obtained 
by setting g* = a/2. Just like in MAPI, the EM iterations 
of MAP2 produce a larger threshold the sparser the hyperpa- 
rameter w is. As well, if a is smaller, r increases. Since the 
variance of the Laplacian j{-;a) is 2/a 2 , a smaller a implies 
a larger variance of the Laplacian. Use of a larger threshold 
is therefore appropriate. 

The tuning parameter g* can be regarded as an extra degree 
of freedom that arises due to g being independent of a, w. The 
MAP2 M-step is a function of g* , and a suitable value has to 
be selected. In contrast, MAPI has no free tuning parameter(s). 



B. Hybrid thresholding rule in the iterative framework 

Define the hybrid estimator to be the estimator formed 
by using the hybrid thresholding rule (11) in the iterative 
framework [16, (24)], viz., 



f =S Thy x[9_ +(a/a) 2 U'(y-H9 { 



(19) 



where S Th ,;((x) = J2i T hy( x i'X)ei and e j £ 
1,...,M are the standard unit vectors. Due to the hybrid 
thresholding rule being a generalization of the soft threshold- 
ing rule, the hybrid estimator potentially offers better perfor- 
mance than lasso. The cost function of the hybrid estimator is 
given in Prop. 1. 



Proposition 1 Consider the iterations (19) when ||H|| 2 < 1 
and a = a. The iterations minimize the cost function 

*C. A v(£) = ||H£-y||!+5> 1 (0 i ) 

i 

where: J x (x) = I(\x\ < Ci - (2)[-(x - sgnix)^ 2 + 2C1C2] 

(20) 



+ H\x\ >Ci-C 2 )(2C 2 M+c 2 2 ) 



Proof. This is an application of Thm. 3 in Appendix I. When 
(1 = (2 = £, Ji(x) = 2(\x \ +( 2 , which gives rise to the lasso 
estimator, as expected. ■ 



C. Using SURE to empirically estimate the hyperparameters 

In this section, SURE is applied to estimate the regulariza- 
tion parameter of lasso and the hybrid estimator. Consider the 
I2 risk measure 



—Ey\\ 



(21) 



for lasso. Since 9 is not known, this risk cannot be com- 
puted; however, one can compute an unbiased estimate of the 
risk [23]. Denote the unbiased estimate by R((): ( can then be 
estimated as £ = argmin^ ef2 i?(C), where O is the set of valid 

values. When H = I, an expression for R(Q is derived in [18, 
(11)]. When H^I, however, Stein's unbiased estimate [23] 
cannot be applied to evaluate (21). In [19], the alternative I2 
risk 

1 

(22) 



R(9,C) = ^Ey\\U(9-9)\\i 



N 

is proposed instead. Equation (22) was evaluated for a diagonal 
H in [19]. 

The first theorem in this section generalizes the result of [19] 
by developing R(() for arbitrary full column rank H. The sec- 
ond theorem in this section derives (22) when 9 is the hybrid 
estimator. For this result, H is also an arbitrary full column 
matrix. If the convolution matrix can be approximated by 2d 
or 3d circular convolution, the full column rank assumption is 
equivalent to the 2d or 3d DFT of the psf having no spectral 
nulls. The proofs of the two theorems are given in Appendix II. 
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1) SURE for lasso: 

Theorem 1 Assume that the columns of H are linearly in- 
dependent, and 9 is the lasso estimator. The unbiased risk 
estimate (22) is 



1 Irr 2 - 

R(O = ^ + ^U\\l + w \\0\\ o 



(23) 



where e = y — 119 is the reconstruction error. 

Since the hyperparameter £ > 0, it can be estimated via 

C = argmin c > i?(C) (24) 

where R(C) is given in (23). LARS can be used to compute 
(24). Note that LARS requires the linear independence of the 
columns of H. The estimator 0/(C) with ( obtained via (24) 
will be referred to as lasso-SURE. 

2) SURE for the hybrid estimator: Several definitions are 
in order first. 

Definition 1 Suppose that 9 G R M has \\9\\ = M — r. 
Denote the non-zero components of 9 by Xi, 1 < i < M — r. 
The permutation matrix P(9_) G R MxM is said to order 
the zero and non-zero components of 9 if ~Pdiag(9yP' = 
diag(0, ...,0,xx,... ,x M -r)- 

Note that P in the above definition is not unique. As P is a 
permutation matrix, it is orthogonal. 

Definition 2 For a matrix A = (a^ ) G ]R <?xr , let c n be a non- 
zero sequence of length at most q s.t. 1 < c n < q. Similarly, let 
d n be non-zero sequence of length at most r s.t. 1 < d n < r. 
The submatrix A[c„,d ra ] = («y) is such that aij = a Ci _d r 

Define Af = Ci — C2 and 
u(£) a I diag[rect(^), . . . ,rect(^)] A c > ^ 

where rect(a;) = 1, |x| < 1/2 and otherwise. Recall that 
< C2 < Ci by assumption, so Ac > 0. Let G(H) = H'H 
denote the Gram matrix of H. For a given 9_, set 

Ci(0) = (PG(H)P')[r + l:M,r + l:M] and (26) 

Ca(|) = -J(PU®P')[r + 1 : M,r + 1 : M], (27) 

where P is a matrix that orders the zero and non-zero 
components of 9. 

Theorem 2 Suppose that the columns of H are linearly 
independent and that G(H) does not have an eigenvalue of 
1/2. With 9 denoting the hybrid estimator, the unbiased risk 
estimate (22) is 

1 On- 2 

R (Q = * 2 + jfhwl + — f(Ci[c x + c^- 1 ) (28) 

where e = y — H9. 

To evaluate (28) for a particular 9_, one would have to 
construct the matrix P; then, invert the (M — r) x (M — r) 



matrix (Cj + C 2 ). If 9 is sparse, (M — r) is small, and 
the inversion would not be computationally demanding. The 
optimum C is the £ G {((1,(2) ■ Ci >^ 0, < C2 < Ci} 
that minimizes R(Q. The corresponding 9j n .(() would be the 
output. This method will be referred to as Hybrid-SURE, or 
for short, H-SURE. 

IV. Simulation study 

In Section IV-B, the following classes of methods are com- 
pared: (i) least-squares (LS) and oracular LS; (ii) the proposed 
sparse reconstruction methods; and (iii) other existent sparse 
methods, viz., SBL and StOMP. 

The LS solution is implemented via the Landweber algo- 
rithm [24]. It provides a "worst-case" bound for the I2 error, 
i.e., |je|j 2 . Since the LS estimate does not take into account the 
sparsity of 9, one would expect it to have worse performance 
than estimates that do. In the oracular LS method, on the 
other hand, one knows the support of 9, and regresses the 
measurement y on the corresponding columns of H [25]. 
The oracular LS estimate consequently provides a "best-case" 
bound for the fe error; however, the oracular LS estimate is 
unimplementable in reality, as it requires prior knowledge of 
the support of 9. The second class of methods includes the two 
MAP/ML variants, MAPI and MAP2; in addition, lasso-SURE 
and H-SURE are also tested. Finally, in order to benchmark the 
proposed methods to other sparse methods, SBL and StOMP 
are included in the simulation study. The Sparselab toolbox 
is used to obtain the StOMP estimate. The CFAR and CFDR 
approaches to threshold selection are applied [11]. For CFAR 
selection, the per-iteration false alarm rate of 1/50 is used. 
For CFDR selection, the discovery rate is set to 0.5. Although 
a multitude of other sparse reconstruction methods exist, they 
are not included in the simulation study due to a lack of space. 

Two sparse images 9 are investigated in Section IV-B: a 
binary-valued image, and an image based on the LAZE prior 
(4). The binary-valued image has 12 pixels set to one, and the 
rest are zero. The LAZE image, i.e., the image based on the 
LAZE prior, can be regarded as a realization of the LAZE prior 
with a = 1 and w = 0.026. They are depicted in Fig. 2a,b 
respectively. The two images are of size 32 x 32, as is y: so, 
M = N = 1024. The matrix H, of size 1024 x 1024, Fs the 
convolution matrix for the Gaussian blur point spread function 
(psf). In order to satisfy the requirements of Thm. 1 and 2, the 
columns of H are linearly independent and G(H) does not 
have an eigenvalue of 1/2. The Gaussian blur is illustrated in 
Fig. 2c. 

The Gaussian blur convolution matrix has columns that are 
highly correlated: the coherence /i = 0.86089. Let A(9) = {i : 
9i 7^ 0}. The stability and support results of lasso all require 
that 

fi\A(9)\ < c (29) 

where c = 1/2 or 1/4 in order that some statement of 
recoverability holds [8]— [10], [25]. For a given H, (29) places 
an upper bound on |A(£)| for which recoverability of 9_ is 
assured in some fashion. With the Gaussian blur H, |A(0)| ^ 
c/0.86089 < 1 for both c = 1/4 and 1/2. Since ||0|| o = 12, 
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LAZE 



(c) Gaussian blur psf 

Fig. 2. Illustration of the two types of 9 used in the simulations; as well, 
the Gaussian blur psf is shown. 

the simulation study is outside of the coverage of existing 
recoverability theorems. 

In Section IV-C, the performance of the proposed sparse 
methods over a range of SNRs is investigated. The binary- 
valued image and Gaussian blur psf are considered in this 
section. In addition to the proposed sparse methods, the LS 
estimate is included as a point of reference. Lastly, a 3d 
MRFM example of dimension 128 x 128 x 32 is given in 
Section IV-D comparing the LS estimate and lasso-SURE. 
This serves to illustrate the computational feasibility of lasso- 
SURE for a relatively large problem. 

The proposed algorithms are implemented as previously 
outlined. The tuning parameter g* of MAP2 is set to l/v2 in 
Section IV-B and IV-C. LARS is used to compute the lasso- 
SURE estimator. H-SURE is suboptimally implemented: the 
minimizing f = £2)' is obtained via two line searches. The 
first, along the (l/\/2, l/v2) direction in the (Ci , C2) plane, 
is done using lasso-SURE. A subsequent line search in the 
(1, 0) direction is performed, i.e., (2 is kept constant and £1 is 
increased. Define the SNR as SNR = (AT- 1 ||H£|| 2 )/cr 2 , and 
the SNR in dB as SNR dB = 101og 10 SNR. 

A. Error criteria 

Recall that the reconstruction error e = — 0. Several error 
criteria are considered in the performance assessment of a 
sparse estimator. 

. ||e|| p for p = 0,1, 2. 

> The detection error criterion defined by 

M 

E d (0,h6) 4 £ 11(0, = 0) - 7(10*1 < S)\ (30) 

Values of 0, such that 1 0, | < 6 are considered equivalent 
to 0. This is used to handle the effect of finite-precision 
computing. More importantly, it addresses the fact that, 



to the human observer, small non-zero values are not 
discernible from zero values. In the study, S = lO^H^Hoo 
is selected. This error criterion is effectively a 0-1 penalty 
on the support of 0. Accurately determining the support 
of a sparse is more critical than its actual values [7], 
[26]. 

• The number of non-zero values of 0, i.e., ||0|| o . One 
would like ||0||o ~ WiWo, which is small if is indeed 
sparse. 

B. Performance under low and high SNR 

The performance of the estimators is given in Table I for 
the binary-valued with the SNR equal to 1.76 dB (low 
SNR) and 20 dB (high SNR). The number reported in Table I 
is the mean over the simulation runs. For each performance 
criterion, the best mean number is underlined. The oracular 
LS estimate is excluded from this assessment, as it cannot be 
implemented without prior knowledge. In terms of ||0||o, the 
best number is the value closest to ||0||o- Recall that for the 
binary-valued image 0, ||0||o = 12. The best number for the 
other performance criterion is the value closest to 0. 

TABLE I 

Performance of the reconstruction methods for the 
binary- valued 6. 





Error criterion 


Method 


kilo 


kill 


Wzh 


E d (e,e) 


\m\o 






SNR = 


1.76 dB 






Oracular LS 


12 


0.880 


0.309 





12 


LS 


1024 


579 


22.6 


1.00 x10 s 


1024 


SBL 


1024 


13.8 


2.35 


58.7 


1024 


StOMP CFAR 


335 


1.46X10 4 


1.50x10 s 


322 


335 


StOMP CFDR 


454 


7.95 xlO 4 


7.17x10 s 


442 


454 


MAPI 


12 


12 


3.46 


12 





MAP2 


15.5 


2.72 


0.912 


3.68 


15.3 


lasso-SURE 


60 


7.83 


1.51 


44.2 


60.6 


H-SURE 


39.3 


7.25 


1.51 


27.0 


39.3 






SNR = 


= 20 dB 






Oracular LS 


12 


0.112 


0.0394 





12 


LS 


1024 


86.1 


3.67 


929 


1024 


SBL 


1024 


1.19 


0.184 


32.2 


1024 


StOMP CFAR 


377 


4.33 xlO 3 


457 


361 


377 


StOMP CFDR 


459 


1.36X10 4 


1.19x10 s 


446 


459 


MAPI 


43.9 


1.07 


0.209 


22.9 


43.9 


MAP2 


230 


3.82 


0.380 


114 


230 


lasso-SURE 


61.2 


0.923 


0.176 


15.7 


61.8 


H-SURE 


22.0 


0.584 


0.152 


2A 


22.0 



In the low SNR case, MAP2 has the best performance. 
MAPI consistently produces the trivial estimate of all zeros, 
as evidenced by the mean value of ||0||o being equal to 0. The 
trivial all-zero estimate results in ||ej| p = ||0|| p for p = 0,1, 2. 
For a sparse 0, a small ||e|jo therefore is not necessarily an 
indicator of good performance. A second comment regarding 
||0||o is that it does not always give an accurate assessment 
of the perceived sparsity of the reconstruction. In Table I, 
SBL never produces a strictly sparse estimate, as the mean 
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|(9||o equals the maximal value of 1024. However, consider 
Fig. 3a, where the SBL estimate for one noise realization at 
an SNR of 1.76 dB is depicted. The 9 looks sparser than 
would be suggested by \\9]\o = 1024. This is because many 
of the non-zero pixel values have a small magnitude, and are 
visually indistinguishable from zero. The SBL estimate has 
many spurious non-zero pixels, in addition to blurring around 
several non-zero pixel locations. Negative values are present 
in the reconstruction, although the binary 9 is non-negative. 



TABLE II 

Performance of the reconstruction methods for the LAZE 0. 





(a) SBL 




(b) STOMP (CFAR) 



■ 

■ 



(c) MAP2 (g 



(d) lasso-SURE 



Fig. 3. Reconstructed images for the binary-valued 9 under an SNR of 1.76 
dB for SBL, StOMP (CFAR), MAP2 (g* = 1/V2), and lasso-SURE. 

The StOMP CFAR , MAP2, and lasso-SURE estimate are 
illustrated in Figs. 3b-d respectively. The StOMP CFAR 9 has 
large positive and negative values. It does not seem like a 
sufficient number of stages have been taken. While blurring 
around several non-zero voxels are evident in the MAP2 
estimate, 9_ closely resembles 9_, cf. Fig. 2a. None of the 
estimators considered here take into account positivity. From 
Fig. 3b, however, one sees that the MAP2 estimate has no 
negative values. Qualitatively, the lasso-SURE estimate looks 
better than SBL, but worse than MAP2. This is reflected in 
the quantitative performance criteria in Table I. 

In the high SNR case, H-SURE has the best performance. 
The mean values of all the performance criteria decrease as 
compared to lasso-SURE. The greatest decreases are in [|e||o, 
Ed, and \\9\\q. They indicate that the H-SURE estimator is 
properly zeroing out spurious non-zero values and producing 
a sparser estimate than lasso-SURE. However, this comes at 
a price of higher computational complexity. 

Examine next the performance of the reconstruction meth- 
ods with the LAZE image. One expects MAPI and MAP2 
to have better performance than the other methods, as the 
image 9 is generated using the LAZE prior. The numbers 
for the performance criteria are given in Table II. Again, the 
reconstruction method with the best number for each criterion 
is underlined. For the LAZE 6, \\9\\ = 27. 

In the low SNR case, MAP2 has the advantage. MAPI 
produces the trivial estimate of all zeros, just as in the 





Error criterion 


Method 


kilo 


yii 


\kh 


E d {0,0) ||fl||o 


SNR = 1.76 dB 


Oracular LS 


27 


5.71 


1.55 


0.56 


27 


LS 


1024 


807 


31.6 


977 


1024 


SBL 


1024 


28.1 


3.99 


72.6 


1024 


StOMP CFAR 


264 


4.37 x 10 3 


558 


244 


257 


StOMP CFDR 


409 


1.62 x 10 4 


1.65x 10 3 


386 


405 


MAPI 


27 


21.2 


5.21 


27 





MAP2 


30.9 


17.5 


3.98 


25.1 


9.77 


LASSO-SURE 


92.6 


20.3 


3.15 


69.3 


81.9 


H-SURE 


67.2 


19.1 


3.14 


51.1 


54.7 






SNR = 


20 dB 






Oracular LS 


27 


0.686 


0.190 


0.6 


27 


LS 


1024 


122 


5.34 


856 


1024 


SBL 


1024 


4.32 


0.814 


33.7 


1024 


StOMP CFAR 


336 


1.00 xlO 4 


110 


305 


330 


StOMP CFDR 


438 


2.67 xlO 4 


250 


408 


435 


MAPI 


69.7 


6.53 


1.34 


31.9 


63.8 


MAP2 


216 


10.8 


1.44 


86.6 


212 


LASSO-SURE 


119 


6.63 


1.32 


31.1 


116 


H-SURE 


84.4 


6.73 


1.35 


33.0 


78.7 



case of the binary-valued 6. The high SNR case has mixed 
results. While SBL has the best mean ||e||i and ||e||2, the 
best result for the other three criteria each occur at a different 
method. The fact that MAPI and MAP2 did not produce 
superior performance over the other methods in the case 
of the LAZE image is unintuitive. As the SNR increases, 
however, the hyperparameter estimates become biased [27]. 
The other unintuitive result is that Ed(0_,(t, S) for the oracular 
LS estimate is not zero. This arises because of the choice of 
5. Since S = 10~ 2 ||#||oc, the values of Bi that are smaller than 
5 in absolute value are thresholded to zero. This results in a 
non-zero Ed in some cases. 

C. Performance vs. SNR of the proposed reconstruction meth- 
ods 

The performance of the proposed reconstruction methods 
when applied to the binary-valued 9 is examined with respect 
to SNR. The intent in this subsection is to study the behavior 
of the proposed methods at SNR values in between the low 
and high values of 1.76 dB and 20 dB respectively. As with 
the previous section, the MAP2 estimator is used with g* = 
1 / y/2. For each estimator, the mean is plotted along with error 
bars of one standard deviation. The error plots are given in 
Fig. 4. Note that in Fig. 4e, the MAPI curve is missing the 
first several SNR values because \\9\\o = and the y-axis is 
in a log scale. 

First, consider the ||e||o, ||e||i, and ||e||2 error criteria. MAPI 
is unable to distinguish the location of the non-zero pixels in 
low SNR. Under high SNR conditions, it has performance 
that is comparable to lasso-SURE and H-SURE in terms of 
the II ell i and llelk errors. The value of ||e|jo increases with 
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(b) kill 
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(d) E d 




(e) ||0||o 

Fig. 4. Performance vs. SNR for Landweber iterations, MAPI, MAP2, lasso- 
SURE, and H-SURE when applied to the binary-valued 8. 



respect to increasing SNR for MAPI. Taken together with 
the ||e||i and ||e||2 curves, the trend is indicative of small 
non-zero coefficients appearing in 9_ that are spurious. MAP2 
also has the same behavior with respect to ||e||o; however, a 
performance gap under high SNR exists in its |je||i and ||e||2 
curves as compared to MAPI, lasso-SURE, and H-SURE. The 
lasso-SURE and H-SURE estimates have curves that decrease 
as the SNR increases. H-SURE's error curve is lower than 
lasso-SURE's for ||e||o and ||e||i, and it is almost identical for 

kk. 

Consider next the Ed and ||0||o error criterion. The lasso- 
SURE curve for \\6}\o is relatively fiat, and its Ed curve 
decreases for high SNR. This indicates that, while the number 
of non-zero coefficients in 9 remains the same, the amplitude 
at the spurious locations are decreasing. With MAPI and 
MAP2, the opposite trend is true. For low SNR, the number of 
non-zero coefficients in 9 is small, but increases with higher 
SNR. A similar increase can be seen in the Ed curves. One 
can conclude that the number of spurious non-zero locations 
is increasing. This phenomenon is due to the bias of the 
hyperparameter estimates [27]. With H-SURE, both the Ed 
and k||o curves decrease as the SNR increases. This behavior 
is intuitive, as higher SNR should result in better performance. 
We note that H-SURE's Ed curve is lower than lasso-SURE's; 
moreover, H-SURE's ||0||o curve is closer to \\6\\o = 12 than 



lasso-SURE's. 

D. MRFM reconstruction example 

A three dimensional example using the hydrogen atom 
locations of the DNA molecule (PDB ID: 103D) [28] as 9 
and the 3d MRFM psf is carried out in this subsection. Both 
9 and y have dimension 128 x 128 x 32, and the SNR is 
4.77 dB. Each hydrogen location in 9 is set to 1, and the 
rest of the locations set to 0. The resulting image 9_ has a 
helical structure: see Fig. 5a. The image represented by H9 
is illustrated in Fig. 5b. The LS and lasso-SURE estimates 




(a) Image 



(b) Image H9 



Fig. 5. Image 8 and noiseless projection H8 used in the MRFM reconstruc- 
tion example. 

are given in Fig. 6 and 7 respectively. The 3d figures plot 




120 



Fig. 6. The LS estimate of the MRFM example under a SNR of 4.77 dB. 

contours for several values. The white volume in Fig. 6 does 
not indicate 9i = 0; rather, the 9i are at a value smaller 
than the lowest color bar value. On the other hand, the white 
volume of the lasso-SURE estimate is mostly 9i = 0. The 
histogram of 9i for the LS and lasso-SURE estimator given in 
Fig. 8a,b respectively illustrate this point. The sharp peak at 
in the lasso-SURE histogram suggests that the lasso estimator 
incorporates a thresholding rule, which it does. The 9i values 
are separated into two distinct sets: the sparse image centered 
around 0.95 and the background around 0. In contrast, the 
histogram of 9 L for Landweber is not separated in this fashion, 
nor does it have a sharp peak at 0. 

V. Summary and future directions 

Use of a mixed discrete-continuous LAZE prior and jointly 
estimating (9_, £) as the maximizer of p(y, 9\Q gives rise to the 
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Fig. 7. The lasso-SURE estimate of the MRFM example under a SNR of 
4.77 dB. 




(a) LS (b) lasso-SURE 



Fig. 8. Histogram of 9i for the LS and lasso-SURE estimator. 

Bernoulli-Laplacian sparse estimators MAPI and MAP2. The 
hybrid thresholding rule is observed in both of these sparse 
estimators. When used in the iterative thresholding framework, 
the resulting penalty on 9 is quadratic around the origin, 
and linear away from the origin, cf. (20). In order to apply 
lasso and the hybrid estimator to data, an empirical means of 
estimating the hyperparameters is required. This is achieved 
via Stein's unbiased risk estimate. 

A numerical study shows that MAPI and MAP2 perform 
well at low SNR, but the performance deteriorates at higher 
SNR. While StOMP demonstrates competitive results in [11], 
such is not the case in the simulation study conducted in 
this paper. The SBL estimate is not sparse; despite this, the 
estimates look visually sparse due to many non-zero values 
being small. In the high SNR regime for the LAZE 9, SBL has 
good performance. When the hyperparameters are estimated 
via SURE, the hybrid estimator achieves a sparser estimate 
with lower l p reconstruction error for p = 0,1,2 as compared 
to lasso. In addition, the hybrid estimator has lower detection 
error Ed. The numerical study suggests that sparse estimators 
based on sparse priors may achieve superior performance to 
the lasso. 

The paper did not compare the MAP/ML and SURE esti- 
mates of the hyperparameters to other estimates, e.g., GCV, the 
method of [20] for lasso, etc. This is primarily due to a lack 
of space. In the case when 9 is a linear function of y, SURE 
is equivalent to the C p statistic, while GCV is the C p statistic 
with a 2 replaced by an estimated version [29]. Unfortunately, 



the sparse estimators considered in the paper are all nonlinear 
in y. Another issue that should be looked in future work is how 
to improve MAPI/2 to rectify the deteriorating performance at 
higher SNR. The estimates a, w generally become more biased 
as the SNR increases [27]. This has been noted in [30]. With 
MAP2, the degree of bias is affected by the selection of g*. 

Implementation considerations were not discussed, although 
they are critical in the implementation of a deconvolution 
algorithm. The interested reader is referred to [27]. In terms 
of increasing complexity, the estimators can be approximately 
ordered as: StOMP, LS/oracular LS, MAPI and MAP2, lasso- 
SURE, H-SURE, and SBL. Thanks to LARS, evaluating a 
goodness-of-fit criterion for lasso whether it be a SURE crite- 
rion, a GCV criterion, etc. has low computational complexity. 
Although LARS requires the selection of individual columns 
of H, this is not an issue when H represents a convolution 
operator. The selection can be efficiently implemented using 
the fast Fourier transform (FFT). Solving for the H-SURE 
hyperparameters has higher computational complexity since an 
efficient implementation of the H-SURE estimator is currently 
lacking. In this paper, the iterative thresholding framework is 
used for part of the solution; however, a LARS-like method 
would be a welcomed improvement. 
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Appendix I 
Proofs of Section IV 

A more general result is derived here. Consider the iteration 

(31) 



t +l) = St X _ (<f ) + (a/cr) 2 H'(y - m (n) 

where T(x; () = J2iT(xi', £)§£ is a thresholding rule [15, 
Sec. 2.3] with the following condition. Suppose that 
has threshold t > 0; then, T(-;£) is strictly increasing on 
R\ {-t,t). Note that T^faQls only defined for i^0. 
Extend the definition at x = to get 



Tt(z;C) = 







x = 



T-ifaQ x^0 



(32) 



T^x; is continuous on i £ 1 \ {0}. For the remainder of 
this section, the dependency of T, T 1-1 , and on £ will be 
omitted for the sake of brevity. 



Proposition 2 The function 

Ji(x) = 2T\x)x-x 2 ~2 
is continuous for iel 



(33) 



e=Tt(x) 



Proof. Since T^(x) is continuous in K\ {0}, the only place 
that should be checked is x = 0. The second term in (33) is 
continuous, so it remains to check the first and third terms. By 
definition of a threshold function, T^(0+) = t and T^O") = 
-t. 

Consider e > 0. Since T*{-) is right continuous at + , there 
exists 5i > s.t. x G (0, <5i) implies that |T T (x) —t\< O.lt. 
Likewise, since T'(-) is left continuous at 0~, there exists 



5 2 > s.t. x G ( 



-<J 2) 0) implies that \T*(x) 
1 



t < O.lt. Set 



8 = — rriin((5i , $2, ) 

so that \x\ < 8 =>■ \xT^(x)\ < e. 

Consider the third term. Define A(£) = T(q)dq: since 
T(-) is continuous, so is A(-). Moreover, for \x\ < t, A(x) = 
0. For e > 0, there exists k > t s.t. |£| < k => \A(£)\ < e. 
Since T^(-) is right continuous at + , there exists 81 > 
s.t. x 6 (0, Si) \T^(x) — t\ < K — t. In a similar fashion, 
since T'(-) is left continuous at 0~, there exists 5 2 > s.t. x E 
(-$2,0) =S> |Tt(x)+i| <K-t. Set S = min(6 u 6 2 ). From 
\x\ < 5, one gets |Tt(x)| < k whence |A(rt(a;))| < e. ■ 



Proposition 3 The minimizer of tp(x) 
x = T(c). 



x 2 — 2cx + Ji (x) is 



Proof. Let (pi(x) = x 2 — 2cx: f'i(x) = 2(x — c), and is 
lower bounded. Similarly, consider J\{x): for x 7^ 0, J[(x) = 
2(T+(x) - a;). Since < T(x) < x for all x > and x < 
T{x) < for all x < 0, 



J[(x) = 



> x > 
< x < 



Ji(x) is also lower bounded. Applying Prop. 2 results in ip(x) 
being a continuous, lower bounded fuction. Consider now two 
cases. 

Case 1: |c| > t, where recall that t is the threshold of T(-). 

For i^0, <p'(x) = 2x~2c+ J[(x) = 2 [7%)) - c]. So 
(fi'(x) = iff T'(x) = c, which occurs uniquely at x = 
T(c) > 0. Consider 



ip'(T(c) + 5)=2[T^T(c)+S)-c] 



(34) 



Since we assume that T(-) is strictly increasing on M\ (— t, t), 
T\x) is also strictly increasing for x 7^ 0. For sufficiently 
small S > 0, <p'(T(c) + 8) > and <p'(T(c) - 8) < 0. 
So x = T(c) is a local minimum. At this value of x, 
f{x) = — 2A(c) < 0. To verify that x is the global minimum, 
it is necessary to compute <p(0) = 0. So indeed, x = T{c) 
minimizes (f(x). 

Case 2: |c| < t. Suppose that the minimizer x 7^ 0. Then, 
the analysis in Case 1 applies, resulting in a; = T(c). But since 
|c| < t by assumption, one gets x = 0. This is a contradiction: 
it must therefore be the case that x = 0. ■ 
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Theorem 3 Suppose that 1 1 H 1 1 2 < 1 and a = a. Consider the 
iteration (31), where T(-) is a thresholding rule with threshold 
t > 0, and T(-) is strictly increasing in K\ (— t, t). Then, the 
iterations (31) converge to a stationary point of ^(9), where 



m-v\\i 



where: J (9) = J2 Jl{ 

i=l 



(35) 



Proof. Use the following definitions, which appear in [17]: 



s(S;fi) = C||g-fi||!-||Hfi-Ha|| 



SUR 



^a) = <^{9) + E{9;a) 



(36) 
(37) 



where C is chosen to ensure that E(9; a) is strictly positive and 
convex in 9 for any choice of a. By assumption, \\H\\2 < 1, 
and so select C = 1 [17]. The function $ SUR (0;a) is the 
surrogate function that is minimized in place of $(0). Consider 
the minimization of $ SUR (0; a), which can be simplified as 



$ SUR (0;a) 



2 -2(a + H'(y-Ha))'0 

J(0) + \\yf 



a\\ 2 -\\Ha\\ 2 (38) 



Since J (9) = Ei-M^i). the minimization of $ SUR (0;a) 
can be decomposed into M subproblems, where each 9{ is 
separately minimized. Indeed, each 9i should minimize 



<p(9i)±9i-2s i 9 i + Jx{9 i ), 



(39) 



where s = a + H'(y — Ha), Apply Prop. 3 to get the 
minimizing 9i, i.e., 9% = T(sj). 

Let 9 denote the sequence generated by 



argminS^M 



(40) 



~(0) f.(n) 

where 9 is the initial estimate. Then, 9 is generated 
by (31), where recall that a ja = 1. Any limit point of the 
iterations (31) is a stationary point of (35) [31]. ■ 



Appendix II 
Proofs of Section V 

A. Proof of Thm. 1 

Recall that G(H) = H'H is the Gram matrix of H. In order 
to simplify notation, for A € R MxM , denote by A n = A[l : 
r, 1 : r], A 12 = A[l : r,r+l : M], A 2 i = A[r+1 : M, 1 : r], 
and A22 = A[r+ 1 : M, r+ 1 : M}. The following proposition 
is needed. Its proof is omitted due to a lack of space. 

Proposition 4 If H has linearly independent columns, 

&/((PG(H)P') 22 ) £ 0. (41) 

where P is a matrix that orders the zero and non-zero 
components of 9. 

For 0, an unbiased estimate of the I2 risk (22) is [23], [32] 



R(0 = v 2 



2a 2 



N £j dy n 



N 

E 



(42) 



where e = y — H0. If £ is obtained via a minimization £ = 
argmin e \I>£(#), (42) can be evaluated as [32, (2)] 



R(0 



1LH2 

N 



2a 2 

_f^tr(H(D 6 

N y ~ 



>*c)" 



(43) 



where D„,„(-) = <9 2 (-)/<9u<V. 

Let ^f(,i(9) = ||F£0-y|| 2 + Ci||0||i denote the cost function 
of lasso. Since ^^,i(9_) is not twice differentiable on R M , (43) 
cannot be directly applied. Consider 



2C 

IT 



M 

- {0jarctan( 



a 



a 



ln(l + -|)} (44) 



a 



m—l 

which is twice differentiable on R M . It can be shown that 
lim a ^o ^(.i{9.; a) = ^(,i(9_) pointwise. The minimizer of 
fy(,i(9;a) therefore equals the minimizer of ^^(fl) in the 
limit as a — > 0. Denote by Ri((;a) the unbiased estimate of 
(22) when 9 is obtained by minimizing ^(,1(9; a). As the RHS 
of (42) is solely a function of 9 (recall that y, H, and a 2 are 

known), lim Q „»o Ri(Ci a ) — Ri(C) pointwise. 
Applying (43) , 



Ri{C,a) = a 2 
where 



11-112 

N 



^tr(G(H)[G(H) + ±Z o (0) 



Z„(g) = ^diag(- 



52 • 

M 



(45) 
(46) 



Consider the tr(-) expression in (45). As P is orthogonal and 
matrix multiplication is commutative under the trace operator, 



tr(G(H) 



G(H 



Z«(© 



tr(PG(H)P' 



PG(H)P' 



PZ„(£)P' 



Without loss of generality, suppose that Z a (0) is ordered so 
that 0i = ... = 9 r = 0, where r = M- ||0|| o and 9 m ^ for 
m > r. Let K = PG(H)P'. Then, [K + Z a (9)/2}- 1 equals 



-F22 K 2 iK n 



_ K 11 1 Ki2F 22 1 



where K n = K n + Z (g) u , K 22 = K 22 + Zg(0) 22 , Fn = 
Kn - KuK^Kai, and F 22 = K 22 - K 2 iK^Ki 2 . Kn 
is invertible for sufficiently small a. Likewise, for sufficiently 
small a, K 22 is invertible by Prop. 4. 

As a -> 0, K^ 1 -> and K 22 G(H) 22 . In addition, 
F n J -» and F 22 -> K 22 . So 

n -1 



K 



K 



Za(£) 



K i2 K 



12-lv 22 
I 22 



(47) 



as a — > 0. Consequently, 



lim i?;(C; a) = o"^ 

a— >0 — 



2d, 

N 



Rl(0- (48) 
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B. Proof of Thm. 2 

Earlier notation from this appendix will be retained. The 
proof of the following proposition is omitted due to a lack of 
space. 

Proposition 5 Suppose that H has linearly independent 
columns. If <fef(P[G(H) - ±U(0)]P') = 0, then G(H) has 
an eigenvalue of 1/2. 

The proof of Thm. 2 parallels the proof of Thm. 1. As ^ *y(£) 
is not twice differentiable on Ti, M , consider instead 

a) - #£,j(g;a) 

M 

+ ^ [Gi (0 m - A c ; a) - Gi (0 TO + A c ; a)} (49) 

m— 1 

where 

_ A (a 2 + x 2 ) arctan(x/a) — ax 1 2 
Gi(x\a) = ± '- + -x 2 . (50) 

Z7T 4 

^C,Ay(^i a ) * s twice differentiable in f\, M and 
lim a ^o *c,'iv(#;a) = pointwise. Result (43) 

can be applied to get 

i? /lv (C; a) = a 2 + 

+ ^tr(G(H)[G(H) + iz o (D - ^U^)]- 1 ) (51) 

with 

U„(g) ^diag((Gi(0 m -A c ;a)-Gi(0 m +A f ;a))£f =1 ) (52) 

Notice that similarity between j(0;a) and 'I/V/™^; a); the 
same applies to R[((^;a) and -R/, v (£;a). The steps of Thm. 1 
can be carried out to evaluate the tr(-) expression in (51) as 
a — > 0. One arrives at 

limtrfK 22 [K 22 - ^(PU^)),,]- 1 ) . (53) 

a— »o y 2 J 

Now linia^o U a (0) = U(0). By assumption, G(H) does not 
have an eigenvalue of 1/2. Therefore, application of Prop. 5 
implies that the inverse in (53) exists. 



